PDF documents can be uploaded individually or in bulk, with or without Upload Profile folders, which allow for integration and automation.
Upload Profile folders are used to automate the uploading of PDF documents to the database. For each PDF that is created in or copied into an Upload Profile folder, PDFKeeper writes the defined Information Properties to a clone of the PDF and the external properties to an XML file, both in the %APPDATA%\Robert F. Frasca\PDFKeeper\UploadStaging folder. After staging, all PDF documents in the UploadStaging folder are uploaded to the database.
Each document is stored in the database as follows:
ID – Assigned to the document by the database.
Title – Extracted from the PDF; this column is searchable.
Author – Extracted from the PDF; this column is searchable.
Subject – Extracted from the PDF; this column is searchable.
Keywords – Extracted from the PDF; this column is searchable.
Added – Date and time the document was added to the database; this column is searchable.
Notes – User editable notes; this column is searchable.
PDF – The PDF document; this column is searchable except when using the local database option.
Category – Optional category of the document; this column is searchable.
Flag – Flag state; 0 = not flagged, 1 = flagged.
Tax Year – Optional tax year of the document; this column is searchable.
Text Annotations – Extracted from the PDF; this column is searchable.
Text – Extracted from the PDF; this column is searchable.
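As a rough illustration, the columns listed above could be modelled as a record like the one below. This is a sketch only: the class name, field names, Python types, and the choice of which fields may be empty are assumptions made for illustration and do not describe PDFKeeper's actual database schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StoredDocument:
    """Hypothetical record mirroring the columns listed above."""
    id: int                     # assigned by the database
    title: str                  # extracted from the PDF
    author: str                 # extracted from the PDF
    subject: str                # extracted from the PDF
    keywords: Optional[str]     # extracted from the PDF (assumed optional)
    added: datetime             # date and time the document was added
    notes: Optional[str]        # user editable notes
    pdf: bytes                  # the PDF document itself
    category: Optional[str]     # optional category
    flag: int                   # 0 = not flagged, 1 = flagged
    tax_year: Optional[str]     # optional tax year
    text_annotations: Optional[str]  # extracted from the PDF
    text: Optional[str]         # extracted from the PDF

    def is_flagged(self) -> bool:
        """Interpret the Flag column: 1 means flagged, 0 means not flagged."""
        return self.flag == 1
```

The only behaviour encoded here is the Flag convention stated above; everything else is a plain container for the listed columns.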
Additional Information:
PDF documents protected by a USER password are not supported by PDFKeeper and will be rejected by the upload process. If you have the legal right to remove the password, plenty of information is available on the Internet to assist you.
Each PDF created in or copied into an Upload folder that is not associated with an Upload Profile must contain a Title, Author, and Subject; otherwise, the PDF will be rejected by the upload process.
Annotations in a PDF document are not filtered by Oracle Database; however, text annotations are extracted by PDFKeeper.
Embedded fonts in a PDF document are not filtered correctly by Oracle Database; however, text from these PDF documents will be extracted by PDFKeeper.
For each image page, or mixed image and text page, that would be processed by OCR, text extraction is skipped when the page's pixel width or pixel height exceeds the maximum image dimensions supported by the Windows OCR engine.
Every 15 seconds, the upload process checks %APPDATA%\Robert F. Frasca\PDFKeeper\Upload for PDF documents that need to be uploaded.
Each PDF in the Upload folder will be moved to the Windows Recycle Bin after being staged.
It is NOT recommended to configure the Windows Recycle Bin to remove files immediately when deleted.
Before restoring PDF documents from the Recycle Bin to the Upload folder, make sure to close PDFKeeper. After restoring, move the files out of the Upload folder to prevent PDFKeeper from uploading them when it is restarted.
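The Title/Author/Subject requirement and the periodic check of the Upload folder described above can be sketched as follows. This is not PDFKeeper's code: the function names, the constant names, and the use of a plain dictionary to represent a PDF's properties are hypothetical. Only the folder path, the 15-second interval, and the three required properties come from the notes above. Whether subfolders are examined is not stated, so this sketch looks only at the top level.

```python
import os
from pathlib import Path

# Folder and interval as stated in the notes above; the names are illustrative.
# %APPDATA% is expanded only on Windows; elsewhere the string is left unchanged.
UPLOAD_DIR = Path(os.path.expandvars(r"%APPDATA%\Robert F. Frasca\PDFKeeper\Upload"))
CHECK_INTERVAL_SECONDS = 15
REQUIRED_PROPERTIES = ("Title", "Author", "Subject")


def missing_properties(properties: dict) -> list:
    """Return the required properties that are absent or blank."""
    return [name for name in REQUIRED_PROPERTIES
            if not str(properties.get(name) or "").strip()]


def pending_pdfs(folder: Path) -> list:
    """Return the PDF files currently in the folder, sorted by name."""
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
```

A pre-check built on these helpers could call `missing_properties` on each file's properties before copying it into an Upload folder that has no Upload Profile, and a monitoring script could call `pending_pdfs(UPLOAD_DIR)` once every `CHECK_INTERVAL_SECONDS` to see what is still waiting.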